Modelling the dynamics of multiagent Q-learning with ε-greedy exploration
Authors
Abstract
We present a framework to model the dynamics of multiagent Q-learning with ε-greedy exploration. The applicability of the framework is tested through experiments on typical games selected from the literature.
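To make the setting concrete, the following is a minimal sketch of the learning process the abstract refers to: two independent stateless Q-learners with ε-greedy action selection playing a repeated 2×2 game. The Prisoner's Dilemma payoffs and all parameter values are illustrative assumptions, not taken from the paper.

```python
import random

# Illustrative Prisoner's Dilemma payoffs (row reward, column reward);
# actions: 0 = cooperate, 1 = defect. Values are assumed for demonstration.
PAYOFFS = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}

def eps_greedy(q, eps):
    """Pick the greedy action with probability 1 - eps, else a uniform random one."""
    if random.random() < eps:
        return random.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)

def run(steps=5000, alpha=0.1, eps=0.1, seed=0):
    """Simulate two ε-greedy Q-learners in a repeated 2x2 game."""
    random.seed(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]  # stateless Q-values, one per action
    for _ in range(steps):
        a1, a2 = eps_greedy(q1, eps), eps_greedy(q2, eps)
        r1, r2 = PAYOFFS[(a1, a2)]
        # Stateless Q-update: no next-state term in a single-state repeated game
        q1[a1] += alpha * (r1 - q1[a1])
        q2[a2] += alpha * (r2 - q2[a2])
    return q1, q2
```

The interesting point, which analyses like the one above study formally, is that the joint Q-values of the two coupled learners need not settle: each agent's learning changes the environment the other faces.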
Similar resources
Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration
Q-learning in single-agent environments is known to converge in the limit given sufficient exploration. The same algorithm has been applied, with some success, in multiagent environments, where traditional analysis techniques break down. Using established dynamical systems methods, we derive and study an idealization of Q-learning in 2-player 2-action repeated general-sum games. In particular, ...
Exploration Methods for Connectionist Q-learning in Bomberman
In this paper, we investigate which exploration method yields the best performance in the game Bomberman. In Bomberman the controlled agent has to kill opponents by placing bombs. The agent is represented by a multi-layer perceptron that learns to play the game with the use of Q-learning. We introduce two novel exploration strategies: Error-Driven-ε and Interval-Q, which base their explorative ...
Exploration and Exploitation Tradeoff using Fuzzy Reinforcement Learning
Difficulty of making a balance between exploration and exploitation in multiagent environment is a dilemma that does not have a clear answer and there are still different methods for investigation of this problem that all refer to it. In this paper, we provide a method based on fuzzy variables for making exploration and exploitation in multiagent environment. In this method, an effective agent ...
Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax
This paper proposes “Value-Difference Based Exploration combined with Softmax action selection” (VDBE-Softmax) as an adaptive exploration/exploitation policy for temporal-difference learning. The advantage of the proposed approach is that exploration actions are only selected in situations when the knowledge about the environment is uncertain, which is indicated by fluctuating values during lea...
Adaptive ε-greedy Exploration in Reinforcement Learning Based on Value Differences
This paper presents “Value-Difference Based Exploration” (VDBE), a method for balancing the exploration/exploitation dilemma inherent to reinforcement learning. The proposed method adapts the exploration parameter of ε-greedy in dependence of the temporal-difference error observed from value-function backups, which is considered as a measure of the agent’s uncertainty about the environment. VDB...
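A sketch of the adaptation rule this abstract describes, in one commonly cited formulation of VDBE (Tokic, 2010): the TD error is passed through a Boltzmann-shaped function so that large errors push ε toward 1 (more exploration) and small errors let ε decay toward 0. The parameter values `sigma` and `delta` below are illustrative assumptions.

```python
import math

def vdbe_update(eps, td_error, alpha, sigma=0.5, delta=0.5):
    """One state-local ε update in the spirit of VDBE.

    eps      -- current exploration rate for the state
    td_error -- temporal-difference error from the last value backup
    alpha    -- Q-learning step size
    sigma    -- inverse sensitivity: smaller sigma reacts to smaller errors
    delta    -- mixing weight between the new evidence and the old eps
    """
    x = math.exp(-abs(alpha * td_error) / sigma)
    f = (1.0 - x) / (1.0 + x)  # in [0, 1): 0 for zero error, -> 1 for large error
    return delta * f + (1.0 - delta) * eps
```

With a near-zero TD error the returned ε shrinks (exploitation once the value estimates stabilize); with a large TD error it grows (exploration while the estimates are still uncertain), which is exactly the adaptive behavior the abstract claims.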